WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification: G06F 9/38

A1

(11) International Publication Number: WO 97/25671

(43) International Publication Date: 17 July 1997 (17.07.97)

(21) International Application Number: PCT/US96/20045

(22) International Filing Date: 20 December 1996 (20.12.96)

(30) Priority Data: 08/583,193, 4 January 1996 (04.01.96), US

(71) Applicant: ADVANCED MICRO DEVICES, INC. [US/US]; Mail Stop 562, 5204 East Ben White Boulevard, Austin, TX 78741 (US).

(72) Inventor: IRETON, Mark, A.; 6005 Roxbury Lane, Austin, TX 78739 (US).

(74) Agent: MILLER, Louise, K.; Advanced Micro Devices, Inc., 5204 East Ben White Boulevard, M/S 562, Austin, TX 78741 (US).

(81) Designated States: JP, KR, European patent (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).

Published
With international search report.

(54) Title: A SUPERSCALAR MICROPROCESSOR INCLUDING A SELECTIVE INSTRUCTION REROUTING MECHANISM



[Figure: block diagram excerpt showing the register file, instruction reroute unit, execute units, and bus interface unit]



(57) Abstract 



A superscalar microprocessor is provided that includes a plurality of execution units each configured to execute the same subset of
instructions. The subset of instructions may include arithmetic instructions and instructions optimized for performing DSP functionality.
Instructions are routed to each of the execution units from an instruction decode unit. Each execution unit includes a plurality of reservation
stations for storing the instructions awaiting execution. The superscalar microprocessor advantageously includes an instruction reroute unit
configured to determine whether a pending instruction within a reservation station of a particular execution unit must wait for more than
a predetermined number of clock cycles before the execution unit can begin its execution. Upon detecting that a pending instruction will
need to wait more than the predetermined number of clock cycles before its execution can begin, the instruction reroute unit transfers the
instruction to another execution unit which is not incurring an execution bottleneck condition.

TITLE: A Superscalar Microprocessor Including A Selective Instruction Rerouting Mechanism
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention is related to the field of microprocessors and, more particularly, to superscalar 
microprocessors including multiple execution units that are optimized for performing DSP functions. 

2. Description of the Relevant Art

Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions
in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the
term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their
intended functions. Memory elements (such as registers and arrays within the microprocessor) capture data values
according to a clock signal which defines the clock cycle. For example, memory elements may capture their data
values based upon a rising or falling edge of the clock signal.

Superscalar microprocessor manufacturers often design microprocessors according to the x86
microprocessor architecture. Due to the widespread acceptance in the computer industry of the x86 microprocessor
architecture, superscalar microprocessors designed to execute x86 instructions may be suitable for use in many
computer system configurations. The x86 instruction set is an example of a complex instruction set computer
(CISC) instruction set. Certain CISC instructions are defined to perform complex operations which may require
multiple clock cycles to complete. For example, a CISC instruction may utilize a memory operand (i.e. an operand
value stored in a memory location as opposed to a register). Fetching the operand from memory may require
several clock cycles prior to execution of the instruction upon the operand value. Additionally, a CISC instruction
may specify several results to be stored in several different storage locations. Since execution units within a
superscalar microprocessor are capable of conveying a finite number of results during a clock cycle, these several
results add complexity. The number of results an instruction specifies may affect the number of clock cycles
required to execute the instruction. Finally, certain mathematical x86 instructions such as divide and multiply
instructions may take numerous processor clock cycles to execute, particularly if they involve memory operands.

Computer systems employing x86 microprocessors also often employ discrete digital signal processors
(DSPs). The DSPs are typically included within multimedia devices such as sound cards, speech recognition cards,
video capture cards, etc. The DSPs function as coprocessors, performing complex mathematical computations
demanded by multimedia devices and other signal processing applications more efficiently than general purpose
microprocessors. Microprocessors are typically optimized for performing integer operations upon values stored
within a main memory of a computer system. While DSPs perform many of the multimedia functions, the
microprocessor manages the operation of the computer system and executes the application programs.

Digital signal processors include execution units which comprise one or more arithmetic logic units
(ALUs) coupled to hardware multipliers which implement complex mathematical algorithms in a pipelined manner.
The instruction set primarily comprises DSP-type instructions (i.e. instructions optimized for the performance of
complex mathematical operations) and also includes a small number of non-DSP instructions. The non-DSP
instructions are in many ways similar to instructions executed by microprocessors, and are necessary for allowing
the DSP to function independent of the microprocessor.

The DSP is typically optimized for mathematical algorithms such as correlation, convolution, finite
impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix
correlations, and inner products, among other operations. Implementations of these mathematical algorithms
generally comprise long sequences of systematic arithmetic/multiplicative operations. These operations are
interrupted on various occasions by decision-type commands. In general, the DSP sequences are a repetition of a
very small set of instructions that are executed 70% to 90% of the time. The remaining 10% to 30% of the
instructions are primarily boolean/decision operations.

As computer systems include more multimedia devices and capabilities, the mathematical computation
performed within the computer system also increases. While computer systems have evolved to include multimedia
functions, microprocessor performance has continued to increase. Still further, the number of transistors included
within microprocessor designs continues to increase with continued improvements in semiconductor fabrication
technology. It is desirable to integrate DSP functionality into the microprocessor to handle the increased
computational demands of modern computer systems and to simplify programming.

However, as stated previously, DSP functions tend to require extensive mathematical computations. The
instructions involved in these computations may each require numerous clock cycles for execution. If a general
purpose superscalar microprocessor is employed to handle the DSP functionality, and particularly if the superscalar
microprocessor employs distributed reservation stations, bottlenecks can occur if one of the execution units is
burdened with a majority of the instructions that require numerous cycles for completion. This condition can
further cause other execution units to stall.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a superscalar microprocessor employing a
selective instruction rerouting mechanism in accordance with the present invention. In one embodiment, a
superscalar microprocessor is provided that includes a plurality of execution units each configured to execute the
same subset of instructions. The subset of instructions may include arithmetic instructions and instructions
optimized for performing DSP functionality. Instructions are routed to each of the execution units from an
instruction decode unit. Each execution unit includes a plurality of reservation stations for storing the instructions
awaiting execution. The superscalar microprocessor advantageously includes an instruction reroute unit configured
to determine whether a pending instruction within a reservation station of a particular execution unit must wait for
more than a predetermined number of clock cycles before the execution unit can begin its execution. The number
of clock cycles before the execution unit can begin execution is determined by the number of cycles remaining for a
currently executing instruction to complete. The determination may further account for the number of cycles
required to complete any other instructions in the reservation stations of the execution unit that are eligible for
execution and that are ahead of the pending instruction with respect to the program order. Upon detecting that a
pending instruction will need to wait more than the predetermined number of clock cycles before its execution can
begin, referred to as an execution "bottleneck" condition, the instruction reroute unit transfers the instruction to
another execution unit which is not incurring an execution bottleneck condition. Accordingly, the burden upon
the first execution unit is decreased, instructions may be executed more expeditiously, and stalling conditions of the
second execution unit may be avoided.

Broadly speaking, the present invention contemplates a superscalar microprocessor comprising a first
execution logic circuit configured to execute a predetermined set of instructions, a first reservation station unit
coupled to the first execution logic circuit and configured to store a pending instruction to be executed by the first
execution logic circuit, a second execution logic circuit configured to execute the predetermined set of instructions,
and a second reservation station unit coupled to the second execution logic circuit. The superscalar microprocessor
further includes an instruction reroute unit coupled to the first and second execution logic units and configured to
reroute the pending instruction to be executed by the first logic unit to the second reservation station unit in
response to the instruction reroute unit determining that the pending instruction must wait for more than a
predetermined number of clock cycles before the pending instruction can begin execution within the first execution
logic unit.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading the following detailed 
description and upon reference to the accompanying drawings in which: 

Figure 1 is a block diagram of a superscalar microprocessor including an instruction reroute unit.

Figure 2 is a block diagram illustrating more detailed portions of the superscalar microprocessor of Figure 1.

While the invention is susceptible to various modifications and alternative forms, specific embodiments
thereof are shown by way of example in the drawings and will herein be described in detail. It should be
understood, however, that the drawings and detailed description thereto are not intended to limit the invention to
the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and
alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

Turning now to Figure 1, a block diagram of one embodiment of a superscalar microprocessor 30 is
shown. Microprocessor 30 includes a bus interface unit 32 coupled to an external bus 34, an instruction cache 36,
and a data cache 38. Data cache 38 is coupled to a load/store unit 40. Instruction cache 36 is coupled to an
instruction decode unit 42, which is coupled to a plurality of execution units including execution units 44A and 44B
(collectively referred to as execution units 44), load/store unit 40, a reorder buffer 48, and a register file 50.
Execution units 44 and load/store unit 40 are also coupled to reorder buffer 48. Finally, an instruction reroute
unit 60 is shown coupled to execution units 44.

Generally speaking, instructions are fetched from instruction cache 36 and conveyed to instruction decode
unit 42 for decode, operand fetch, and dispatch. Instruction decode unit 42 decodes each instruction in order to
determine which of execution units 44 or load/store unit 40 is configured to execute the instruction. Instruction
decode unit 42 dispatches the instruction to a unit which is configured to execute the instruction and has resources
to execute or store the instruction at the time the instruction is decoded. Additionally, register operands (i.e.
operands stored in register file 50) are decoded from the instruction in order to convey operand requests to register
file 50 and reorder buffer 48.

Each of the execution units 44 is configured to execute various instructions. In one embodiment,
execution units 44 are symmetrical execution units. Symmetrical execution units are each configured to execute the
same subset of the instruction set employed by superscalar microprocessor 30. For example, symmetrical execution
units may each be configured to execute instructions within the x86 instruction set except for load/store memory
operations. These instructions may include arithmetic operations, shift operations, and branch operations. In
another embodiment, execution units 44 may be configured in an asymmetrical fashion, wherein certain instructions
may be executed by one execution unit but not the other. At least a subset of instructions, however, may be
executed by either execution unit 44, as will be explained in further detail below. It is also understood that
additional execution units may be employed, such as a floating point unit or a dedicated branch unit.

For the embodiment of Figure 1, each execution unit is further configured to execute certain defined DSP
instructions. DSP instructions may include highly optimized mathematical functions. For example, a multiply and
accumulate operation is a DSP function. DSPs often include a multiply and accumulate function which multiplies a
pair of operands together and adds the product to a third operand. The third operand may maintain an
accumulation of prior multiplications. The multiply and accumulate function is useful in many numerically
intensive applications such as convolution and numerical integration. Additionally, the DSP instructions supported
by microprocessor 30 may further be optimized to repetitively operate upon a large number of operands stored
contiguously in a memory. For such instructions, the memory may be accessed via a pair of pointer registers and
the pointer registers may be incremented or decremented concurrently.
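
As an illustration only, the multiply and accumulate behavior described above can be sketched in Python. The function name `mac_block`, the list standing in for memory, and the `step_a`/`step_b` parameters are assumptions made for this sketch; the description does not define an instruction encoding or a particular addressing scheme.

```python
def mac_block(memory, ptr_a, ptr_b, count, step_a=1, step_b=1, acc=0):
    """Hypothetical model of a repeated multiply-and-accumulate.

    memory: list standing in for contiguously stored operands.
    ptr_a, ptr_b: the pair of pointer registers (indices into memory).
    step_a, step_b: +1 to increment, -1 to decrement after each operation.
    """
    for _ in range(count):
        # Multiply a pair of operands and add the product to the accumulator.
        acc += memory[ptr_a] * memory[ptr_b]
        # Both pointer registers are updated together.
        ptr_a += step_a
        ptr_b += step_b
    return acc


# Convolution-style use: one pointer walks forward over the samples while
# the other walks backward over the coefficients.
memory = [1, 2, 3, 10, 20, 30]
result = mac_block(memory, ptr_a=0, ptr_b=5, count=3, step_a=1, step_b=-1)
```

Here `result` is 1*30 + 2*20 + 3*10, the kind of sum of products that appears in convolution.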

Instruction cache 36 is configured to store a plurality of lines of instructions prior to their execution by
microprocessor 30. It is noted that instruction cache 36 may be configured in a set-associative or direct-mapped
configuration. Multiple instructions are fetched from instruction cache 36 and conveyed to instruction decode unit
42 during a clock cycle. In one embodiment, instruction cache 36 includes an instruction fetching mechanism
which selects fetch addresses for fetching instructions. The instruction fetch mechanism may fetch instructions
subsequent to those fetched in a previous clock cycle. In addition, instructions may be fetched from the predicted
target of a branch instruction. A branch prediction mechanism may be included within instruction cache 36 for
performing branch prediction. Any branch prediction mechanism may be used by instruction cache 36. Finally,
instructions may be fetched according to a mispredicted branch instruction or an exception.

Load/store unit 40 is configured to execute load and store memory operations. Since load/store unit 40 
performs load and store memory operations which access a memory address, load/store unit 40 is coupled to data 
cache 38. Additionally, load/store unit 40 detects memory dependencies between addresses accessed and modified 
by various instructions. 

Execution units 44 and load/store unit 40 each include one or more reservation stations for storing
dispatched instructions prior to the execution of those instructions. One or more operands for an instruction may
not be available, causing a delay in executing the instruction. Additionally, the unit may be executing another
instruction provided to the unit in a previous clock cycle, causing a delay in executing a subsequent instruction.
Instructions remain in the reservation station until operands become available, at which time the instruction
becomes eligible for execution. A second instruction which is subsequent to a first instruction in program order
may execute out of order with the first instruction if the second instruction receives its operands prior to the first
instruction.
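
The eligibility rule in the preceding paragraph can be sketched as follows. This is a minimal model, not the disclosed circuit: the `PendingInstruction` class, its field names, and the string tags are hypothetical and chosen only to show how a later instruction can become eligible before an earlier one.

```python
from dataclasses import dataclass, field


@dataclass
class PendingInstruction:
    name: str
    order: int                                     # position in program order
    waiting_on: set = field(default_factory=set)   # operands not yet available

    def eligible(self):
        # Eligible for execution once no operand is outstanding.
        return not self.waiting_on


def operand_available(stations, tag):
    """Record that the operand identified by `tag` has been produced."""
    for instr in stations:
        instr.waiting_on.discard(tag)


stations = [
    PendingInstruction("first", order=0, waiting_on={"t1"}),
    PendingInstruction("second", order=1, waiting_on={"t2"}),
]
operand_available(stations, "t2")
ready = [instr.name for instr in stations if instr.eligible()]
```

After the operand tagged `t2` arrives, only the second instruction is eligible, so it may execute ahead of the first.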

Microprocessor 30 supports out of order execution. Reorder buffer 48 is included to ensure that
instructions are executed such that they calculate the same results as when the instructions are executed entirely in
order. Reorder buffer 48 provides dependency checking, register renaming, mispredicted branch recovery, etc.
When an instruction is decoded and dispatched by instruction decode unit 42, temporary storage is allocated within
reorder buffer 48 for the results of the instruction. A tag identifying the storage location is assigned to the
instruction. It is noted that instructions are decoded and dispatched in program order, allowing reorder buffer 48 to
allocate storage locations for each instruction in program order. Reorder buffer 48 therefore tracks the original
program order of instructions, regardless of the order in which the instructions are actually executed in execution
units 44 and load/store unit 40. For simplicity, execution units 44 and load/store unit 40 will be collectively
referred to herein as functional units.

For each instruction which utilizes register operands, a request is made by instruction decode unit 42 to
reorder buffer 48 and register file 50 for the operand. If reorder buffer 48 is storing an instruction which updates
the requested register, then reorder buffer 48 provides either (1) the operand, if the operand has been produced via
execution of the instruction, or (2) a tag identifying the reorder buffer location to store the operand, if the operand
has not yet been produced. If reorder buffer 48 is not storing an instruction which updates the register, then the
operand value is provided by register file 50. Register file 50 includes storage locations for storing the value of
each register defined by the microprocessor architecture employed by microprocessor 30. It is noted that, when a
functional unit transmits a result to reorder buffer 48, the tag identifying the instruction being executed is also
transmitted. The result is received by reorder buffer 48 and stored in the storage location indicated by the
corresponding tag. Additionally, instructions which are within functional units or reservation stations awaiting the
result may detect the tag conveyed to reorder buffer 48 and capture the result as it is conveyed. This technique is
often referred to as "result forwarding". It is noted that, in cases where reorder buffer 48 is storing more than one
update to a particular register, the tag or value associated with the last instruction (in program order) is conveyed
in response to the request.
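
The operand lookup described above can be sketched in Python. The entry layout (dictionaries with `tag`, `dest`, and `value` keys) and the function name `read_operand` are assumptions for illustration; the description does not specify how reorder buffer 48 is organized internally.

```python
def read_operand(reorder_buffer, register_file, reg):
    """Return ("value", v) or ("tag", t) for register `reg`.

    reorder_buffer: list of entries in program order, each a dict with
    keys "tag", "dest", and "value" (None until the result is produced).
    """
    # Scan from the most recent entry so the last update in program order wins.
    for entry in reversed(reorder_buffer):
        if entry["dest"] == reg:
            if entry["value"] is not None:
                return ("value", entry["value"])   # case (1): result produced
            return ("tag", entry["tag"])           # case (2): result pending
    # No pending update: the register file holds the current value.
    return ("value", register_file[reg])


register_file = {"r1": 7, "r2": 9}
reorder_buffer = [
    {"tag": 0, "dest": "r1", "value": 42},    # already executed
    {"tag": 1, "dest": "r1", "value": None},  # later update, not yet executed
]
r1_source = read_operand(reorder_buffer, register_file, "r1")
r2_source = read_operand(reorder_buffer, register_file, "r2")
```

In this sketch the request for `r1` returns the tag of the later, unexecuted update rather than the older value 42, and the request for `r2` falls through to the register file.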

Reorder buffer 48 retires instructions (i.e. stores the results into register file 50 and deletes the
instructions) in program order. An instruction may be retired when each instruction within reorder buffer 48 which
is prior to that instruction in program order has been executed and is ready for retirement. In this manner,
instructions which are speculatively executed and later found to be incorrectly executed may be discarded prior to
updating register file 50. For example, an instruction may be subsequent to a branch instruction. If the branch
instruction is found to be mispredicted via execution of the branch instruction by a functional unit, then the
subsequent instruction may be part of a code sequence which is not intended to be executed. Because the
instruction has not updated register file 50, it may be discarded from reorder buffer 48 and the instruction will
appear to have never executed. Instructions subsequent to an instruction which causes an exception may be
handled similarly. It is noted that instructions which include a store memory access may not update register file 50,
but do not perform their store memory accesses until the instructions are otherwise ready for retirement in reorder
buffer 48. In one embodiment, reorder buffer 48 conveys tags of instructions including a store memory access
which are ready for retirement to load/store unit 40. Load/store unit 40 then performs the corresponding store
memory accesses and the instructions may be retired.
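
In-order retirement and the discarding of speculative instructions can be sketched as below. The `deque` of dictionaries, the `done` flag, and the helper names `retire` and `squash_after` are hypothetical; they model the behavior described above rather than any disclosed structure.

```python
from collections import deque


def retire(reorder_buffer, register_file):
    """Retire completed instructions from the head, in program order."""
    retired = []
    while reorder_buffer and reorder_buffer[0]["done"]:
        entry = reorder_buffer.popleft()
        if entry["dest"] is not None:
            # The register file is updated only at retirement.
            register_file[entry["dest"]] = entry["value"]
        retired.append(entry["tag"])
    return retired


def squash_after(reorder_buffer, branch_tag):
    """Discard every entry after a mispredicted branch (which must be present)."""
    tags = [entry["tag"] for entry in reorder_buffer]
    keep = tags.index(branch_tag) + 1
    while len(reorder_buffer) > keep:
        reorder_buffer.pop()


register_file = {"r1": 0, "r2": 0}
reorder_buffer = deque([
    {"tag": 0, "dest": "r1", "value": 5, "done": True},
    {"tag": 1, "dest": None, "value": None, "done": True},   # mispredicted branch
    {"tag": 2, "dest": "r2", "value": 99, "done": True},     # speculative
])
squash_after(reorder_buffer, branch_tag=1)
retired = retire(reorder_buffer, register_file)
```

The speculative write to `r2` never reaches the register file, so the discarded instruction appears never to have executed.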



Data cache 38 is a high speed cache memory configured to store data accessed by microprocessor 30. It is
noted that data cache 38 may be configured as a set-associative or direct-mapped cache.

Bus interface unit 32 is included to effect communications between microprocessor 30 and other devices
within a computer system employing the microprocessor. The other devices may be coupled to external bus 34, or
coupled to another device or devices which are coupled to external bus 34. In particular, instruction cache 36
communicates instruction fetch addresses which miss instruction cache 36 to bus interface unit 32. Similarly, data
addresses which miss data cache 38 are conveyed to bus interface unit 32. Bus interface unit 32 is configured to
communicate the addresses to a main memory upon external bus 34, and to convey the data or instructions received
to data cache 38 and instruction cache 36, respectively. Still further, bus interface unit 32 receives cache lines
discarded from data cache 38 which have been modified with respect to main memory. Bus interface unit 32
transfers such cache lines to main memory via external bus 34.

As will be described in greater detail below in conjunction with Figure 2, instruction reroute unit 60 is
configured to determine the latency that an instruction which is eligible for execution within a reservation
station of a particular execution unit 44 incurs before it can actually begin execution. More specifically, the instruction
reroute unit 60 receives a signal from execution unit 44A that indicates whether the execution unit is going to take
more than a predetermined number of clock cycles, such as 5 clock cycles, to complete execution of the currently
executing instruction. Instruction reroute unit 60 is further configured to determine the number of clock cycles that
other eligible instructions awaiting execution within the reservation station unit will take before they can begin and
ultimately complete execution. If an instruction eligible for execution within the reservation station unit of
execution unit 44A must wait longer than a predetermined number of clock cycles before its execution can begin,
the instruction reroute unit 60 advantageously reroutes the instruction to execution unit 44B, provided that the
instructions executing or awaiting execution within execution unit 44B will not take greater than a predetermined
number of cycles to complete execution. Instruction reroute unit 60 is similarly configured to reroute instructions
from the reservation stations of execution unit 44B to execution unit 44A.

Turning next to Figure 2, a block diagram is shown that illustrates further details of the microprocessor 30
of Figure 1. Circuit portions that correspond to those of Figure 1 are numbered identically for simplicity and
clarity.

In the illustrated configuration of Figure 2, execution unit 44A includes execution logic circuit
61A and a group of reservation stations 62-1 through 62-3 (referred to collectively as reservation station unit 62).
Execution unit 44B similarly includes execution logic circuit 61B and reservation stations 63-1 through 63-3
(referred to collectively as reservation station unit 63).

Execution logic circuit 61A may be configured to execute the same set of instructions as
execution logic circuit 61B or may be configured to execute a subset of instructions that are executable by
execution logic circuit 61B, and vice versa. Instruction decode unit 42 decodes the instructions and dispatches the
decoded instructions to either execution unit 44A or execution unit 44B, depending on the type of instruction (if the
instruction is executable by only one of the units) and based upon the availability of empty entries within the
associated reservation stations (if the instruction is executable by either of the units 44A and 44B). Once an
instruction has been stored within one of the reservation station units 62 or 63, the instruction will become eligible
for execution when the instruction's operands are available. Upon eligibility, the instruction along with its
operands is passed to the corresponding execution logic circuit 61. It is noted that the reservation stations
associated with each execution unit 44 may be configured to provide instructions to the associated execution logic
circuit 61 in program order, or may be configured to provide any eligible instruction to the associated execution
logic circuit 61.
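
The dispatch choice described above (instruction type first, then availability of empty reservation station entries) can be sketched as follows. The function `choose_unit` and its tie-breaking rule are assumptions; the text states only that the choice is based upon the availability of empty entries.

```python
def choose_unit(capable_units, free_entries):
    """Pick an execution unit for a decoded instruction.

    capable_units: names of the units able to execute the instruction.
    free_entries: dict mapping unit name to its count of empty
        reservation station entries.
    Returns a unit name, or None if no capable unit has an empty entry.
    """
    candidates = [u for u in capable_units if free_entries.get(u, 0) > 0]
    if not candidates:
        return None
    # Assumption: prefer the unit with the most empty entries.
    return max(candidates, key=lambda u: free_entries[u])


free_entries = {"44A": 1, "44B": 3}
only_a = choose_unit(["44A"], free_entries)          # executable by one unit only
either = choose_unit(["44A", "44B"], free_entries)   # executable by either unit
```

An instruction executable by only one unit goes to that unit if it has room, while an instruction executable by either unit goes, in this sketch, to the emptier one.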

During operation, instruction reroute unit 60 receives a signal from execution logic circuit 61A at line 70
indicative of whether the current instruction being executed by execution logic circuit 61A will take more than a
predetermined number of clock cycles before completion. In one embodiment, the predetermined number of clock
cycles is five (i.e., execution logic circuit 61A drives line 70 high if a currently executing instruction will take
more than five clock cycles before completion). Instruction reroute unit 60 receives a similar signal from execution
logic circuit 61B at line 71.
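
A minimal sketch of the signal on line 70 (or line 71) follows. The function name `busy_signal` and the cycle-by-cycle trace are hypothetical; only the threshold of five clock cycles comes from the embodiment described above.

```python
LONG_LATENCY_THRESHOLD = 5  # value given for one embodiment


def busy_signal(cycles_remaining, threshold=LONG_LATENCY_THRESHOLD):
    """Model of the level driven on line 70: high when the currently
    executing instruction needs more than `threshold` further cycles."""
    return cycles_remaining > threshold


# Trace of the signal as a long instruction counts down from 8 to 4
# remaining cycles.
trace = [busy_signal(remaining) for remaining in range(8, 3, -1)]
```

The signal stays high while more than five cycles remain and drops once exactly five remain, since the comparison is strictly "more than".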

For the embodiment of Figure 2, an eligible instruction within any of the reservation stations of
reservation station unit 62 may be provided to execution logic circuit 61A, even though earlier instructions (in
program order) are waiting for operands. If more than one instruction within reservation station unit 62 is
eligible for conveyance to execution logic circuit 61A upon a particular clock cycle, the oldest instruction is
provided to execution logic circuit 61A (i.e., upon completion of a previously executing instruction within
execution logic circuit 61A). For example, if an instruction pending within reservation station 62-1 is eligible for
execution and earlier dispatched instructions within reservation stations 62-2 and 62-3 are still awaiting their
operands, the instruction within reservation station 62-1 is conveyed to execution logic circuit 61A
immediately after a previously executing instruction within execution logic circuit 61A completes. If an
instruction within reservation station 62-1 and an instruction within reservation station 62-3 are eligible for
execution, the earlier-dispatched instruction will be provided to execution logic circuit 61A upon completion of the
previously executing instruction.
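
The selection rule above (oldest eligible instruction first) can be sketched as follows. Representing each reservation station as a `(program_order, operands_ready)` tuple, with `None` for an empty station, is an assumption of this sketch.

```python
def select_next(stations):
    """Return the index of the station to issue from, or None.

    stations: list whose items are (program_order, operands_ready)
    tuples, or None for an empty station. A lower program_order means
    the instruction was dispatched earlier.
    """
    eligible = [(entry[0], index) for index, entry in enumerate(stations)
                if entry is not None and entry[1]]
    if not eligible:
        return None
    # The smallest program_order is the oldest eligible instruction.
    return min(eligible)[1]


# Indices 0, 1, 2 stand for reservation stations 62-1, 62-2, 62-3.
case_one = select_next([(2, True), (0, False), (1, False)])
case_two = select_next([(2, True), (0, False), (1, True)])
```

In the first case only the instruction in station 62-1 is eligible, so it issues even though older instructions are still waiting. In the second case the instruction in station 62-3 issues first because it was dispatched earlier.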



In addition to the signal at line 70 generated by execution logic circuit 61A and indicative of the number of
clock cycles before completion of a currently executing instruction within execution logic circuit 61A, instruction
reroute unit 60 also receives information regarding the instructions pending within reservation station unit 62.
Using this information, instruction reroute unit 60 is configured to determine whether, for an instruction which is
eligible for execution within reservation station unit 62, the number of clock cycles before it can actually begin
execution exceeds a certain number. This determination is based upon the number of clock cycles required to
complete the currently executing instruction in execution logic circuit 61A and any other eligible instructions in
reservation station unit 62 awaiting conveyance to the execution unit. If the number of cycles before execution of
the eligible instruction can begin exceeds a predetermined threshold, and if the number of cycles required to
complete execution of a currently executing instruction within execution logic circuit 61B and to execute any other
eligible instructions within reservation station unit 63 does not exceed a second predetermined threshold, the
eligible instruction within reservation station unit 62 of execution unit 44A is rerouted through instruction
reroute unit 60 to an available reservation station of execution unit 44B.
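
The wait estimate described above can be sketched as a simple sum. This assumes each instruction's execution latency is known in advance and that the execution logic circuit handles one instruction at a time; the function names and the example latencies are hypothetical.

```python
def cycles_until_start(current_remaining, older_eligible_latencies):
    """Estimate the cycles before a pending, eligible instruction can begin.

    current_remaining: cycles left for the instruction now in the
        execution logic circuit.
    older_eligible_latencies: latencies of the eligible instructions in
        the same reservation station unit that will issue first.
    """
    return current_remaining + sum(older_eligible_latencies)


def is_bottlenecked(current_remaining, older_eligible_latencies, threshold):
    """True when the estimated wait exceeds the predetermined threshold."""
    wait = cycles_until_start(current_remaining, older_eligible_latencies)
    return wait > threshold


wait = cycles_until_start(4, [3, 2])
bottleneck = is_bottlenecked(4, [3, 2], threshold=5)
```

With four cycles remaining on the current instruction and two older eligible instructions needing three and two cycles, the estimated wait is nine cycles, which exceeds a threshold of five.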

For example, consider a situation wherein an eligible instruction within reservation station 62-1 must wait
for other earlier-dispatched eligible instructions within reservation stations 62-2 and 62-3 before it can begin
execution. The eligible instruction within reservation station 62-1 must further wait for the currently executing
instruction within execution logic circuit 61A to complete. In this situation, if the instruction reroute unit 60
determines that the number of clock cycles required to complete the execution of the currently executing instruction
within execution logic circuit 61A and to execute the instructions within reservation
stations 62-2 and 62-3 exceeds a predetermined threshold number of clock cycles, the instruction
reroute unit 60 will transfer the eligible instruction within reservation station 62-1 to execution unit 44B, provided
that there is an available reservation station within reservation station unit 63, and provided that the number of
clock cycles required to complete the currently executing instruction within execution logic circuit 61B and any
earlier-dispatched, eligible instructions within reservation station unit 63 does not exceed a second predetermined
threshold. Similar transfers may be
effectuated for instructions within reservation stations 62-2 and 62-3, preferably in program order. Typically, the
first predetermined threshold (to qualify an eligible instruction for transfer to another execution unit) is greater than
the second predetermined threshold, which is used as a condition to prevent the transfer of an instruction.
Additionally, if all instructions within the reservation stations of reservation station unit 63 are awaiting operands,
an eligible instruction in reservation station unit 62 may be transferred by instruction reroute unit 60 and provided
directly to execution logic circuit 61B for execution in the next clock cycle.
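
The two-threshold decision in the example above can be sketched as a single function. The threshold values 8 and 4 are placeholders, not values from the description (which states only that the first threshold is typically greater than the second). It is also an assumption of this sketch that the direct path to execution logic circuit 61B does not require a free reservation station.

```python
def reroute_decision(src_wait, dst_load, dst_free_stations, dst_all_waiting,
                     first_threshold=8, second_threshold=4):
    """Return "stay", "reroute", or "reroute_direct".

    src_wait: cycles the eligible instruction must wait in its own unit.
    dst_load: cycles the other unit needs for its current instruction and
        its own eligible instructions.
    dst_free_stations: free reservation stations in the other unit.
    dst_all_waiting: True if every instruction held in the other unit's
        reservation stations is still awaiting operands.
    """
    if src_wait <= first_threshold:
        return "stay"              # no bottleneck in the source unit
    if dst_load > second_threshold:
        return "stay"              # the other unit is too busy to help
    if dst_all_waiting:
        return "reroute_direct"    # hand straight to the execution logic
    if dst_free_stations > 0:
        return "reroute"           # place in a free reservation station
    return "stay"


decision = reroute_decision(src_wait=10, dst_load=2,
                            dst_free_stations=1, dst_all_waiting=False)
```

Keeping the second threshold below the first means an instruction is moved only to a unit that is clearly less loaded than the one it leaves.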

Since the instruction reroute unit 60 serves to transfer eligible instructions from a heavily-burdened
execution unit to an execution unit which is not as heavily burdened, stalling conditions within the microprocessor
30 may be avoided and instructions may be executed more expeditiously. Accordingly, the overall performance of
microprocessor 30 may be improved.



Numerous variations and modifications will become apparent to those skilled in the art once the above
disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations
and modifications.
1. A superscalar microprocessor comprising:

a first execution logic circuit configured to execute a predetermined set of instructions;

a first reservation station unit coupled to said first execution logic circuit and configured to store a pending
instruction to be executed by said first execution logic circuit;

a second execution logic circuit configured to execute said predetermined set of instructions;

a second reservation station unit coupled to said second execution logic circuit; and

an instruction reroute unit coupled to said first and second execution logic units and to said first and
second reservation station units and configured to reroute said pending instruction to be executed by said first logic
unit to said second reservation station unit in response to said instruction reroute unit determining that said pending
instruction must wait for more than a predetermined number of clock cycles before said pending instruction can
begin execution within said first execution logic unit.

2. The superscalar microprocessor as recited in Claim 1 wherein said first execution logic circuit is
configured to generate a signal indicative of a number of clock cycles before completion of a currently executing
instruction.

3. The superscalar microprocessor as recited in Claim 2 wherein said signal generated by said first
execution logic circuit is provided to said instruction reroute unit.

4. The superscalar microprocessor as recited in Claim 3 wherein said instruction reroute unit is
configured to determine that said pending instruction must wait for more than said predetermined number of clock
cycles based upon said signal generated by said first execution logic circuit.

5. The superscalar microprocessor as recited in Claim 4 wherein said instruction reroute unit is
further configured to determine that said pending instruction must wait for more than said predetermined number of
clock cycles based upon a minimum number of clock cycles to execute other eligible instructions stored within said
first reservation station unit.

6. The superscalar microprocessor as recited in Claim 5 wherein said other eligible instructions are
ahead of said pending instruction with respect to program order.

7. The superscalar microprocessor as recited in Claim 1 wherein said instruction reroute unit is
further configured to determine a number of clock cycles before said pending instruction can be executed by said
second execution logic circuit.

8. A superscalar microprocessor comprising: 

a first execution logic circuit configured to execute a predetermined set of instructions;

a first reservation station unit coupled to said first execution logic circuit and configured to store a pending
instruction to be executed by said first execution logic circuit;

a second execution logic circuit configured to execute said predetermined set of instructions;

a second reservation station unit coupled to said second execution logic circuit; and

an instruction reroute unit coupled to said first and second execution logic units and to said first and
second reservation station units and configured to determine whether said pending instruction must wait for more
than a predetermined number of clock cycles before said pending instruction can begin execution within said first
execution logic unit and configured to reroute said pending instruction to said second reservation station in
response to said instruction reroute unit determining that said pending instruction must wait for more than said
predetermined number of clock cycles before said pending instruction can begin execution within said first
execution logic unit.

9. The superscalar microprocessor as recited in Claim 8 wherein said first execution logic circuit is
configured to generate a signal indicative of a number of clock cycles before completion of a currently executing
instruction.

10. The superscalar microprocessor as recited in Claim 9 wherein said signal generated by said first
execution logic circuit is provided to said instruction reroute unit.

11. The superscalar microprocessor as recited in Claim 10 wherein said instruction reroute unit is
configured to determine that said pending instruction must wait for more than said predetermined minimum number
of clock cycles based upon said signal generated by said first execution logic circuit.

12. The superscalar microprocessor as recited in Claim 11 wherein said instruction reroute unit is
further configured to determine that said pending instruction must wait for more than said predetermined number of
clock cycles based upon a number of clock cycles to execute other eligible instructions stored within said first
reservation station unit.

13. The superscalar microprocessor as recited in Claim 12 wherein said other eligible instructions are
ahead of said pending instruction with respect to program order.
14. The superscalar microprocessor as recited in Claim 8 wherein said instruction reroute unit is
further configured to determine a number of clock cycles before said pending instruction can be executed by said
second execution logic circuit.